Wan VACE #11582

Draft · wants to merge 9 commits into main
Conversation

a-r-r-o-w (Member) commented May 19, 2025

Checkpoints (temporary, until the official weights are hosted):

T2V

import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video

model_id = "/raid/aryan/diffusers-wan-vace-1.3b/"  # local path to a converted checkpoint; swap in the hosted weights once available
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
flow_shift = 5.0  # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=flow_shift)
pipe.to("cuda")

prompt = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
    conditioning_scale=0.0,
    # conditioning_scale=1.0,  # alternative setting used for the comparison below
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(output, "output.mp4", fps=16)
| `conditioning_scale=0` | `conditioning_scale=1` |
| --- | --- |
| output2.mp4 (video attachment) | output.mp4 (video attachment) |
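For completeness, a minimal sketch of how both comparison videos could be produced in one run. It continues the script above (reusing pipe, prompt, and negative_prompt); the loop and the filename-to-scale pairing are inferred from the table and are not part of the original snippet.

# Hypothetical comparison loop; filename-to-scale pairing assumed from the table above
for scale, filename in [(0.0, "output2.mp4"), (1.0, "output.mp4")]:
    output = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=480,
        width=832,
        num_frames=81,
        num_inference_steps=30,
        guidance_scale=5.0,
        conditioning_scale=scale,
        generator=torch.Generator().manual_seed(0),
    ).frames[0]
    export_to_video(output, filename, fps=16)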

I2V

import torch
import PIL.Image
from diffusers import AutoencoderKLWan, WanVACEPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image


def prepare_video_and_mask(img: PIL.Image.Image, height: int, width: int, num_frames: int):
    img = img.resize((width, height))
    frames = [img]
    # Ideally, this fill value should be 127.5 to match the original code, but that value cannot be
    # represented in a uint8 PIL image (the original performs the computation on numpy arrays).
    # If you pass numpy arrays instead of PIL images, you can use 127.5 to match the original code exactly.
    frames.extend([PIL.Image.new("RGB", (width, height), (128, 128, 128))] * (num_frames - 1))
    mask_black = PIL.Image.new("L", (width, height), 0)
    mask_white = PIL.Image.new("L", (width, height), 255)
    mask = [mask_black, *[mask_white] * (num_frames - 1)]
    return frames, mask


model_id = "/raid/aryan/diffusers-wan-vace-1.3b/"  # local path to a converted checkpoint; swap in the hosted weights once available
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
flow_shift = 5.0  # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=flow_shift)
pipe.to("cuda")

prompt = "An astronaut emerging from a cracked, otherworldly egg on the barren surface of the Moon—his form silhouetted against the stark lunar dust, as if being born into silence. The vast darkness of space looms behind, punctuated by distant stars, capturing the immense depth and isolation of the cosmos. The scene is rendered in ultra-realistic, cinematic detail, with dramatic lighting and a breath-taking, movie-like camera angle that evokes awe and mystery—blending themes of rebirth, exploration, and the uncanny."
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")

height = 480
width = 832
num_frames = 81
video, mask = prepare_video_and_mask(image, height, width, num_frames)

output = pipe(
    video=video,
    mask=mask,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=num_frames,
    num_inference_steps=30,
    guidance_scale=5.0,
    generator=torch.Generator().manual_seed(42),
).frames[0]
export_to_video(output, "output.mp4", fps=16)
output.mp4 (video attachment)
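As the comment in prepare_video_and_mask notes, the original implementation fills the padding frames with 127.5 on float numpy arrays. Below is a hedged sketch of a numpy-based helper that matches that fill value exactly; prepare_video_and_mask_np is a hypothetical name, and it assumes the pipeline accepts lists of numpy frames in the 0-255 range, as the comment suggests.

import numpy as np
import PIL.Image


def prepare_video_and_mask_np(img: PIL.Image.Image, height: int, width: int, num_frames: int):
    # First frame is the conditioning image; the rest are mid-gray padding at exactly 127.5
    first = np.array(img.resize((width, height)), dtype=np.float32)
    gray = np.full((height, width, 3), 127.5, dtype=np.float32)
    frames = [first] + [gray] * (num_frames - 1)
    # Mask: 0 keeps the first frame, 255 marks frames to be generated
    keep = np.zeros((height, width), dtype=np.float32)
    generate = np.full((height, width), 255.0, dtype=np.float32)
    mask = [keep] + [generate] * (num_frames - 1)
    return frames, mask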

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
